Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models
Authors
Abstract
Recently, Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In addition to the standard affine transformation, an FHL contains a speaker-dependent (SD) transformation matrix, formed as a linear combination of rank-1 matrices, and an SD bias, formed as a linear combination of vectors. In this work, we extend FHL-based adaptation to multiple variabilities of the speech signal. Experimental results on the Aurora4 task show a 26.0% relative improvement over the baseline when standard FHL adaptation is used for speaker adaptation. Multi-attribute FHL adaptation yields further gains over standard FHL adaptation, with improvements of up to 29.0% relative to the baseline.
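To make the layer structure described above concrete, the following is a minimal PyTorch-style sketch of a factorized hidden layer; it is not the authors' implementation, and the module name, dimensions, and the speaker vector d are illustrative assumptions. The SD transform is realized as U diag(d) V^T, i.e. a linear combination of rank-1 matrices weighted by the speaker vector, and the SD bias as a linear combination of basis vectors.

# Minimal sketch of a factorized hidden layer (FHL); names and sizes are assumptions.
import torch
import torch.nn as nn

class FactorizedHiddenLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_bases):
        super().__init__()
        # Standard speaker-independent affine transformation.
        self.linear = nn.Linear(in_dim, out_dim)
        # Bases for the speaker-dependent (SD) transform: the SD matrix is
        # U diag(d) V^T, a linear combination of n_bases rank-1 matrices
        # u_k v_k^T weighted by the speaker vector d.
        self.U = nn.Parameter(torch.randn(out_dim, n_bases) * 0.01)
        self.V = nn.Parameter(torch.randn(in_dim, n_bases) * 0.01)
        # Bases for the SD bias: a linear combination of column vectors.
        self.bias_bases = nn.Parameter(torch.zeros(out_dim, n_bases))

    def forward(self, x, d):
        # x: (batch, in_dim) acoustic features; d: (batch, n_bases) speaker vector.
        sd_proj = (x @ self.V) * d          # d-weighted projections, (batch, n_bases)
        sd_term = sd_proj @ self.U.t()      # output of the SD transform
        sd_bias = d @ self.bias_bases.t()   # SD bias
        return torch.sigmoid(self.linear(x) + sd_term + sd_bias)

In this sketch, setting d to a one-hot or all-zero vector recovers the standard affine layer, which is one way to see that the SD terms act as a low-footprint, per-speaker correction on top of the shared parameters.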
Similar works
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models
Using auxiliary input features has been seen as one of the most effective ways to adapt deep neural network (DNN)-based acoustic models to a speaker or environment. However, this approach has several limitations. It only compensates the bias term of the hidden layer and therefore does not fully exploit the network's capabilities. Moreover, it may not be well suited for certain types of...
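As a rough illustration of the bias-only limitation mentioned in this snippet (hypothetical layer names and sizes, not code from the paper): because the first layer is affine, W[x; v] + b = W_x x + (W_v v + b), so an auxiliary vector v that is constant over a speaker or utterance only shifts the layer's bias.

# Sketch of auxiliary-feature input adaptation; names and sizes are assumptions.
import torch
import torch.nn as nn

class AuxInputLayer(nn.Module):
    def __init__(self, feat_dim, aux_dim, hidden_dim):
        super().__init__()
        self.W_x = nn.Linear(feat_dim, hidden_dim)              # acoustic-feature weights
        self.W_v = nn.Linear(aux_dim, hidden_dim, bias=False)   # auxiliary-feature weights

    def forward(self, x, v):
        # W_v(v) is fixed within a speaker/utterance, so it acts purely as a
        # speaker-dependent bias offset on the first hidden layer.
        return torch.sigmoid(self.W_x(x) + self.W_v(v))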
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models
Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In FHL adaptation, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices whereas the SD bias is a linear combination of vector...
Rapid adaptation for deep neural networks through multi-task learning
We propose a novel approach to improving the effectiveness of parameter adaptation for deep neural network (DNN) based acoustic models in automatic speech recognition: one or more small auxiliary output layers are added to model broad acoustic units, such as monophones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation dat...
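A hedged sketch of such a multi-task setup, with hypothetical layer sizes and names rather than the paper's configuration: a shared stack feeds both the senone output layer and a small auxiliary monophone output layer, and the auxiliary loss regularizes adaptation when data is scarce.

# Sketch of a multi-task acoustic model with an auxiliary output layer; assumptions only.
import torch
import torch.nn as nn

class MultiTaskAcousticModel(nn.Module):
    def __init__(self, feat_dim, hidden_dim, n_senones, n_monophones):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.senone_out = nn.Linear(hidden_dim, n_senones)         # primary task
        self.monophone_out = nn.Linear(hidden_dim, n_monophones)   # auxiliary task

    def forward(self, x):
        h = self.shared(x)
        return self.senone_out(h), self.monophone_out(h)

# During adaptation both heads are trained; a weighted sum of the senone and
# monophone cross-entropy losses constrains updates under limited data, e.g.
# loss = ce(senone_logits, senone_targets) + alpha * ce(mono_logits, mono_targets)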
A study of speaker adaptation for DNN-based speech synthesis
A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied....
Robust i-vector based adaptation of DNN acoustic model for speech recognition
In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based i-vectors that use HMM state alignment information from an ASR system for estimating the i-vectors. Further, we ...